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ABSTRACT 

We present a full Bayesian algorithm designed to perform automated searches of the 
parameter space of caustic-crossing binary-lens microlensing events. This builds on 
previous work implementing priors derived from Galactic models and geometrical con- 
siderations. The geometrical structure of the priors divides the parameter space into 
well-defined boxes that we explore with multiple Monte Carlo Markov Chains. We 
outline our Bayesian framework and test our automated search scheme using two data 
sets: a synthetic lightcurve, and the observations of OGLE-2007-BLG-472 that we 
analysed in previous work. For the synthetic data, we recover the input parameters. 
For OGLE-2007-BLG-472 we find that while \ 2 is minimised for a planetary mass- 
ratio model with extremely long timescale, the introduction of priors and minimisation 
of BIC, rather than % 2 , favours a more plausible lens model, a binary star with com- 
ponents of 0.78 and 0.11 M Q at a distance of 6.3 kpc, compared to our previous result 
of 1.50 and 0.12 Mq at a distance of 1 kpc. 
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1 INTRODUCTION 

Gravitational microlensing (|Einsteinl Il936h is a well- 
established technique to detect extrasolar planets (e.g. 
Mao fc Paczvnskilll99ll , iBeaulieu et~afl l200rj . iMuraki et at] 
20111 ). and is complementary to other methods, being able 
to probe low-mass cool planets that are inaccessible to them 
from the ground. This allows us to carry out statistical stud- 
ies of pla nets of all masses l ocated at a few AU from their 
host star (jCassan et alJl2012T ). Microlensing occurs when one 
or several compact objects are located between a source 
star and the observer, leading to a gravitational deflection 
of the light from the source star by the "lens" objects. As 
the source and lens move in and out of alignment, this de- 
flection is observable in the form of a simple characteristic 
brightening and fading patt ern when the lensing object is a 
single star (|Paczvnskilll986T ). but takes a much more com- 
plex form when the lens is made up of more than one ob- 
ject. When that happens, the lightcurve typically features 
"anomalies", which can be modelled to determine the na- 
ture of the lensing system. One of the configurations that 
can lead to anomalies is when the lensing system contains 
one or more planets. In order to determine the properties of 
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these planets, the anomalies must be analysed through de- 
tailed modelling; this paper is concerned with cases where 
the lens consists of two components. 

Analysing anomalous microlensing lightcurves can be a 
significant computational challenge for a number of reasons. 
The calculation of a full binary-lens lightcurve, including the 
effects of having an extended source, is an expensive process 
computationally, and the parameter spa ce to be explored is 
complex, with several degeneracies (e.g. iKubas et al.ll2005l ). 
This is the case even when second-order effects, such as that 
of parallax due to the Earth's orbit or orbital motion in the 
lensing system, are ignored. 

A significant number of the ~ 1500 microlensing events 
now being discovered by survey teams in a season exhibit 
anomalies due to stellar or planetary companions to the lens 
star. Many of these are caustic-crossing events in which the 
lightcurve exhibits rapid jumps, brightening when a new pair 
of imag es forms and fa ding when two images merge and dis- 
appear. ICassanl (120081 ) introduced an advantageous parame- 
terisation for caustic-crossing events by linking two param- 
eters, ti n and t ou t, to the caustic-crossing times and two pa- 
rameters, Sin and Sout , to the ingress and egress points where 
the source-lens trajectory crosses the caustic curve. These 
parameters make it easier to locate all possible source-lens 
trajectories that fit the observed caustic-crossing features. 
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iKains et all (|2009h used the ICassanl (|2008l ) parameters 
to analyse the observed lightcurve of the microlensing event 
OGLE-2007-BLG-472, which exhibits two strong caustic- 
crossing features separated by about 3 days. This short 
duration suggested that the anomaly could be due to the 
source crossing a small planetary caustic, motivating de- 
tailed modelling to rule out alternative binary star lens mod- 
els. The lowest-x 2 model has a planetary mass ratio, but 
an extremely long event timescale, £e ~ 2000 days, much 
longer than the 2-200 day range typical of Ga l actic B ulge mi- 
crolensing events. On this basis IKains et all (|2009T ) rejected 
the global \ 2 minimum by placing an ad-hoc 300 day cutoff 
on £e , and suggested that a Bayesian approach including ap- 
propriate priors on all the parameters would more naturally 
shift the posterior probability to local \ 2 minima with less 
extr eme paramete r s. 

ICassan et al.l (|2010T l derived analytic formulae for the 
prior 7r(si n ,s out ) corresponding to a uniform and isotropic 
distribution of lens-source trajectories, which are specified 
by an angle a and impact parameter uq. A suitable prior 
on is arises by using a model of microlensing in the Galaxy 
to determine distributions for the lens and source distances 
and their relative proper motion, or alternatively by using 
a parameterised model fitted to the observed distribution 
of <e among all the events found in the microlensing sur- 
vey. In either case a prior on is effectively penalises very 
long and very short events, lowering the posterior probabil- 
ity of the tE ~ 2000 d global x 2 minimum found for OGLE- 
2007-BLG-472 and favouring local minima with more typ- 
ical event timescales. Priors on other parameters can also 
be derived from models of stellar populati on synthesis such 
as the Besangon model |Robin et al.ll2003l ). which we use in 
this work. 

In this paper we develop further the Bayesian analysis 
of caustic-crossing events, exploiting intrinsic features of the 
7T (sin, Sout) prior to specify and test a procedure suitable 
for automatic exploration of the full parameter space. We 
test the procedure using synthetic lightcurve data, and we 
re-analyse the OGLE-2007-BLG-472 data to compare the 
results of maximum likelihood analysis (x 2 minimisation) 
with the full Bayesian analysis including appropriate priors. 



2 BINARY-LENS MICROLENSING 

In the context of microlensing, caustics are locations in the 
source plane, behind the lens, where the magnification is in- 
finite. A point-mass lens produces a point caustic directly 
behind the lens where formation of an Einstein Ring gives 
infinite magnification for a point source, or very large mag- 
nification for a finite size source. The point-lens gives a 
symmetric magnification pattern A(u), with u the projected 
source-lens distance in the source plane, in units of the Ein- 
stein Ring radius. The linear source trajectory has impact 
parameter uo and a timescal e tE, both exp ressed in units of 
the angular Einstein radius (|Einsteinlll936T l. 

where M is the lens mass, and Ds and Dl are the distances 



to the source and the lens respectively. This produces a sym- 
metric lightcurve with magnification A(u(t)) peaking at Ao 
at time to. Thus 3 parameters, uo, to, and ts define the 
shape of a point-source point-lens lightcurve. Finite source 
effects alter the peak of the lightcurve when uo is of order 
p* — 6+/0E, the source star radius in Einstein radius units. 

With two or more lens masses, the simple point caustic 
becomes a more complex set of closed curves consisting of 
concave segments joining at cusps, the shapes and locations 
depending on the lens masses and locations. Microlensing 
lightcurve anomalies, relative to the point-lens model, arise 
from the asymmetric magnification pattern associated with 
these caustic curves. The source trajectory may pass nearby 
to a cusp, causing a bump in the lightcurve, or cross over 
a caustic curve, resulting in a variety of complex lightcurve 
features, depending on the exact lens geometry. For a static 
binary lens, the caustic pattern depends on the mass ratio 
q and separation d between the two lens masses. The source 
trajectory relative to the caustics is specified by the impact 
parameter uo relative to the centre of mass of the lens, and 
the trajectory angle a relative to the line connecting the two 

lens masses. 

As emphasised by ICassanl (|2008h and IKains et all 

(|2009l ). for caustic-crossing events the standard parameter- 
isation makes it very difficult to conduct a systematic ex- 
ploration of the parame ter spac e . The alternative parame- 
terisation formalised by ICassa n (2008) replaces (uo, a, to, 
tE, p*) by equivalent parameters (s in , s ou t, tin, t out , At cc ) 
that are more closely related to observable lightcurve fea- 
tures, and therefore better constrained by observations. Of 
the "standard" binary-lens microlensing parameters, two are 
retained: the mass ratio of the lens components q (< 1), and 
thei r projected s e parat ion d. 

IKains et al.l |2009r i show that the alternative param- 
eters are better suited to fitting caustic-crossing event 
lightcurves, finding models that are widely separated and 
easily missed with the standard binary-lens parameteri- 
sation. However, the global x 2 minimum found in the 
IKains et all i|2009t ) analysis of OGLE-2007-BLG-472 was a 
model with a "planetary" mass ratio q ~ 10~ 4 , but with an 
extremely long timescale, tE ~ 2000 days. This model was 
rejected through a qualitative discussion with the expecta- 
tion that a Bayesian analysis would more naturally shift the 
best fit to a different local \ 2 minimum with less exotic 
parameters. In this paper, we add priors on relevant param- 
eters, attempting to remove the need for such qualitative 
arguments by using a badness-of-fit statistic that includes 
both the likelihood and additional terms originating from 
prior information. 

3 DATA 

3.1 Synthetic Lightcurve Data 

To test our automated algorithm, we generated a data 
set using the parameters given in Table [T] selected to re- 
produce features seen in observed anomalous microlensing 
lightcurves. The chosen parameters correspond to a caustic- 
crossing binary-lens event with crossings separated by 7 days 
and occurring near the lightcurve peak (Fig. [TJ. 

For an observation at time i;, when the source is mag- 
nified by a factor A(U), the true model magnitude is 
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Units 


to 


5503.6 


MHJD 


'e 


27.2 


days 


Q 


1.68 


rad 


»() 


0.1 




P* 


0.003 




d 


1.22 




g 


0.08 




g = Fb/Fs 
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Table 1. Standard binary-lens parameters used to generate our 
synthetic data. 



Hi = -2.51ogio(-F s A(U) + F B ) , 



(2) 



where the un-magnified source flux Fs was chosen to be 1/5 
of the blend flux Fb, which represents un-magnified stars 
that are blended with the microlensing target. The source- 
lens trajectory's impact parameter uo —0.1 is small enough 
to reach magnification A ~ 10 near the closest approach at 
time to. 

We obtain synthetic magnitude data mi by using a 
pseudo-random number generator to sample a Gaussian dis- 
tribution with mean [m, and standard deviation en, given by 



0.01 



1 + | mo - fu 



(3) 



where mo = — 2.5 logio (-Fs + -Fb) is the baseline magnitude, 
corresponding to the un-magnified source flux plus the blend 
flux. The fractional error bars are thus 1% at the baseline 
and decrease when the source is magnified. After generat- 
ing the synthetic magnitude data, we re-scaled these error 
bars to obtain a x 2 of 1 per degree of freedom for the true 
model. This approximates the common practice of rescaling 
the nominal error bars when fitting to observed microlensing 
lightcurves. 

We employed a non-uniform cadence emulating a typ- 
ical microlens observing strategy. We start with a baseline 
cadence of one observation per night, increasing to 3 obser- 
vations per night as the event nears the peak predicted by a 
point-source point-lens (PSPL) fit to the earlier data. When 
the anomaly is detected, i.e. when the synthetic lightcurve 
data departs significantly from the PSPL fit, the cadence 
increases to 5 observations per night. From the resulting 
lightcurve, a random sample of N points is selected to em- 
ulate data loses, e.g. due to bad weather or technical issues. 

The resulting synthetic lightcurve, retaining 199 data 
points, is shown on Fig.[T] A plot of the true model lightcurve 
with the parameters given in Table [T] is shown on Fig. [2] 

3.2 OGLE-2007-BLG-472 

This event was alerted during the 2007 microlensing observ- 
ing season by the OGLE collaboration, and followed up from 
two observing sites by the PLANET collaboration. It was 
used as the test event by Kains et al. (2009, see that paper 
for full details on the data sets) to illustrate the capabilities 
of a modelling scheme based on the parameters defined by 
ICassanl (|2008T ). We use the same event here for comparison 
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Figure 1. Synthetic data used in this paper, plotted with 1-cr 
error bars calculated using Eq. Inl- 



and in particular to show that the full Bayesian analysis 
shifts the posterior probability away from the exotic param- 
eters found in the previous maximum likelihood analysis. 



4 BAYESIAN FRAMEWORK 

Our analysis implements a Bayesian framework for fitting 
microlensing events involving caustic crossings. For M pa- 
rameters 9 and data D, the posterior probability density in 
the M-dimensional parameter space is 



P(0\D) = 



JP(D\9)n(8) d M 8 



(4) 



Here n (8) is the prior on the M parameters and P(D\9) is 
the likelihood, e.g. for Gaussian errors with known standard 
deviations <ta the likelihood is 



L{8) oc P(D\6) = 



exp{-§ X 2 (a,P)} 
Z D 



with 



1=1 



where fu(8) is the model prediction for data Di, and 



Z D = (2 7T 



N/2 



(->) 



(6) 



is a measure of the iV-dimensional volume admitted by the 
data. 

In fitting the binary lens model to microlensing 
lightcurve data, we project the posterior distribution in the 
full M-dimensional parameter space onto the (d, q) plane, a 
process known as marginalising over the m = M — 2 nuisance 
parameters, which we denote collectively by f3: 



P(d,q\D) = / P(d,q,P\D)dTP 

= n(d,q) jP(/3\D,d,q)d m l3 



(8) 



where 7t (d, q) is the prior distribution on the (d, q) plane. We 
take 7T (d, q) to be uniform in log d and log q. This choice 
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Figure 2. Top: Synthetic data and true model lightcurve, gener- 
ated with the parameters given in TablelTI Bottom: The (si n , s ou t) 
prior map, with the location of the true model shown with a filled 
yellow circle. 



comes from the fact that the sizes of the caustics behave like 
power laws of d and q. 

We then marginalise over nuisance parameters by sim- 
ply averaging over the samples of o ur Markov Chain Mo nte 
Carlo algorithm (MCMC, see e.g lGelman et all Il995l for 
more background information on this), 



/ 



X(e)P(f3\D,d,q)d m (3^ (X) 



(9) 



where X(0) is any function of the parameters, and we use 
the notation (X) to refer to a simple unweighted average 
over the MCMC samples. The result is a map of the poste- 
rior probability distribution P(d, q\D). We will find that the 
maximum aposteriori (MAP) estimates of (d, q), which max- 
imise P(d,q\D), can be quite different from the maximum 
likelihood (ML) estimates, which maximise P(D\q,d, /3), or 
mimimise \ 2 ■ 



4.1 Feature- based Parameters and Structure of 
the Prior tv (s 

in , Sout ) 

The benefit of using the ICassanl (^OOct ) parameters 
(sin, Sout, tin, tout, At cc ), rather than the standard parame- 
ters (uo, a, t E , to, p+), is two-fold. First, the caustic-crossing 
time parameters ti n , tout and At cc can often be tightly con- 



strained by features in the observed lightcurve. Second, the 
(■Sin, Sout) parameters bring together onto a compact square 
all models that have caustic crossings at those times. In 
contrast, with the standard (uo, a, tE, to, p*) parameters, the 
models with caustic crossings at times ti n and t ou t are widely 
separate d and difficul t to lo cate. 

The iKains et al.l ([2009) analysis us ed a genetic a lgo- 
rithm and assumed uniform priors on the[c assanl (|2008T ) pa- 
rameters. This has obvious problems: for example, since the 
caustic folds that make up a caustic structure are concave 
(see e.g. upper panel inset of Fig. [2}, a linear source trajec- 
tory cannot enter and then exit a caustic along the same 
caustic fold. This needs to be reflected in suitable priors 
on the corresponding parameters, in this example, on the 
(■Sin, s out ) parameters, which determine where the source- 
lens t£ajectOTy_CTOsses_the caustic folds. 

ICassan et aT] (|201Ct ) derived analytic formulae for 
the prior 7r (si n , s ut | tin, tout, At cc ), hereafter shortened to 
7r(sin, Sout), corresponding to a uniform isotropic distribu- 
tion of source-lens trajectories, and introduced also a prior 
7T (tE) on the event timescale, showing how 7r (£e) effectively 
modifies n (si n , s out ). The analytic prior is proportional to 
the Jacobian of t he transformation between the standard 
and ICassanl (|200ST ) parameters, 



.7 



d(uo,a,t E ,t , p*) 



in 7 <5 ut 3 ^in •, ^out j A^cc) 



(10) 



ICassan et al.l (fioiol 'l evaluated this Jacobian to find the an- 
alytic form of 7r(sin,s ou t) corresponding to uniform priors 
on all standard parameters. 

As can be seen in e.g. Fig. [2] the prior n (si n , s ou t) covers 
a compact square, since Si n and s out run over the same range 
as we move around the closed caustic curve. The square 
naturally sub-divides into "sub-boxes", the boundaries of 
which correspond to the cusps. For a caustic with iV c cusps, 
there are thus N% sub-boxes. However, the sub-boxes on 
the anti-diagonal of the (s- m , s ou t) square have Si n and s ou t 
on the same caustic fold, which cannot occur due to the 
concave geometry of the folds. This means that the anti- 
diagonal sub-boxes must have zero probability. There are 
thus N c (N c — 1) sub-boxes to consider for each caustic. 

The event timescale prior tt (tE) can in principle be ob- 
tained by considering models of microlensing in the Galaxy, 
mapping the joint distribution of lens mass, lens and source 
distances, and relative prop er motion onto the correspond- 
ing distribution of t E (e.g. iDominikl [2006) . A convenient 
alternative is to use the observed distribution of iE, e.g. 
from the OGLE survey. Caution is needed because of possi- 
ble biases in fitting t E to observed lightcurves, and selection 
effects lowerin g the occurrence of short and long t E events 
in the survey. ICassan et all (|2010f ) considered two different 
priors on t E to illustrate their effect on n (si n , s ou t). One 
was a distribution of event timescales observed in past mi- 
cr olensing seasons, and another was the model distribution 
of IWood fc Maol (|2005l l . These two distributions were shown 
to be in excellent agreement with each other. 

In this paper we derive a 2-dimensional joint prior on 
the event timescale and source size using simulations of syn- 
thetic stellar popu lation obtained with the Besangon model 
(|Robin et alj|2003l ). We briefly describe our method to de- 
rive this in the next section. 
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4- 1.1 Deriving priors from a Galactic model 

The initial maximum likelihood analysis of the m icrolens- 
ing event OGLE-2007-BLG-472 (|Kains et al.ll2009l ) suggests 
an unusual parameter combination as best description of 
the data. The strength and plausibility of such an ap- 
proach can be tested by using a Galactic model that reflects 
our prior knowledge of the Galactic structure. For inter- 
preting microlensing lightcurves, different Galactic models 
jHan fc Gouidlll995l . iBennett fc Rhiell2002l . IPominikllioO^ ') 
are in use. These models differ in details of the assumed 
spatial, kinematic, and mass distributions of the Galac- 
tic Bulge and Disk stellar populations. A different model 
which is adapted to reflect the observed star counts in the 
optical and near-in frared is the so-ca lled Besanqon mode l 
jRobin et all 120031 ). As indicated by iKerins et all |2009l) . 
this model can be used to predict the optical depth of grav- 
itational microlensing events. Moreover it can be used to 
used to set detailed parameter constraints when combined 
with the adapted parameter estimates and the source star 
properties. 

Based on the available online catalogue simulation, we 
generated a sample of stars between Earth and 11 kpc. To 
ensure that potential lenses, which are typically faint, are in- 
cluded, the apparent magnitude was not constrained. Base d 
on the value used in the previous paper (jKains et al.ll2009h . 
we assumed a visual extinction Av = 0.7 mag kpc -1 , where 
the resulting model extinction curve stops increasing after 
several kpc. For a more ac curate descript i on, th e calibrated 
spatial extinction in K s ^Marshall et all 120061 ) could have 
been used, but this would have required a calibration for 
the I band, which is typically used in microlensing observa- 
tions. 

In order to infer microlensing distributions from the 
simulation, lens-source pairs were randomly drawn from the 
sample. These were then accepted or rejected depending on 
the area of their corresponding angular Einstein ring, which 
gives the instantaneous lensing probability. The simulated 
bolometric magnitude and effective temperature allowed us 
to estimate p,. Including the simulated proper motion pro- 
vided us with an estimate for the Einstein time - the only 
observable parameter directly connected to the lens mass. 
We did not include the lens-source relative proper motion in 
the resampling procedure, but this would lead to a distribu- 
tion that favours shorter Einstein times, as fast lenses lead 
to larger detection zones on the sky. This is a consequence of 
moving the lensing cross-section of the instantaneous lens- 
ing probability along the lens-source relative proper motion. 
The correction depends on the annual survey observation 
and the survey efficiency in tE. For events much longer than 
the sampling rate, the increase in lensing probability can 
be modelled as a stripe on the sky. For a coverage of 240 
days which we assume here, the actual prior distribution of 
the event duration changes its expected value by an amount 
that is negligible in comparison to the error ellipse of instan- 
taneous case. 

Our es t imate s illustrate that the value found by 
iKains et all (|200Sh for ts is much larger and that of p* 
much smaller than typical samples drawn from the Besancon 
model. Consequently, we determine a bivariate Gaussian 
prior based on the covariance matrix of <e and angular 
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Figure 3. Contour plot of the the logarithm of our joint prior 
on p* and <e- The location of the best-fit models identified for 
event OGLE-2007-BLG-472 are shown with white filled circles 
and labelled for the different statistics we use, as discussed in the 
text. 

source star radius of the simulated sample. This joint prior 
is plotted on Fig. [3] 

Since trajectories requiring very large values of ts and/ 
or very small p* are suppressed by this prior, so too are the 
corresponding regions of the (si n , s ou t) plane. That is, for 
given values of ii n and tout, regions of (si n ,s out ) where the 
source enters and exits the caustic structure very close to the 
same cusp are suppressed. This is evident on the n (si n , s ou t) 
maps, e.g. the bottom panel of Fig. [2] where the prior is low 
in the corners al ong the anti-d iagonal line. 

Because the lCass"anl (|2008l ) parameters assume that the 
source trajectory crosses a caustic, and because we are com- 
paring caustics of different sizes as we move across the (d, q) 
plane, a full implementation of the uniform isotropic prior on 
source-lens trajectories must account for large caustics being 
easier to hit than small ones. If two models have equal \ 2 
but cross caustics of different sizes, the prior should favour 
the model with a larger probability of being hit. As each 
( Sin, Sout) corresponds to a different source trajectory angle 
q, we quantify this by defining nu(d,q,a), the probability 
that a caustic will be "hit" by a trajectory with angle a. 
This is proportional to the range of impact parameters in- 
tersecting the caustic, i.e. the projected size of the caustic 
perpendicular to the source trajectory. The concave struc- 
ture of the caustic means that once the N c cusp positions 
are found, and rotated by an angle —a, the vertical range 
the n gives the pro j ected cross-section of the caustic. Thus if 
the ICassan et all l|2010t ) prior 7r(sin,s out ) is normalised to 
1 when integrated over the (si n ,s ou t) square, the full prior 
multiplies this by irn(d, q, a). 

4.2 Automated modelling scheme 

The flowchart in Fig. [4] summarises the main steps of our 
automated modelling scheme. In summary, 

(i) For each node in the (d, q) grid, we construct the cor- 
responding caustic curves. 

(ii) For each caustic curve, we construct the 7r(si n ,s ut) 
prior map, which divides into sub-boxes. 

(iii) In each sub-box, we launch an MCMC run on the /3 
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parameters to find the best fit and map out the posterior 
P(j3\D, d, q, box). Chains are kept confined to each sub-box 
by forcing MCMC steps to remain within its boundaries. 

The results are then collected to construct the posterior 
probability P(d,q\D), either by optimising the nuisance pa- 
rameters or by integrating over the nuisance parameters in 
each sub-box, and then summing over the sub-boxes. Finally, 
we compute the corresponding "Badness-of-Fit" statistic 
BoF(d,q) = -2 In P{d,q\D). 

Our automated modelling scheme exploits the structure 
of the prior n (si U , s ut)- For a caustic with N c cusps, the 
prior 7r (sin, s out ) has N c (N c — 1) local maxima, one in each 
of the sub-boxes. As the separation between these local max- 
ima can be large, a single MCMC run, or whichever other 
parameter optimisation method is employed, may find it dif- 
ficult to jump from one sub-box to another, and thus may fail 
to find the best solution. To avoid this, we launch an MCMC 
run in each of the sub-boxes. Thus we start N C (N C — 1) 
chains, each confined to a particular sub-box, to locate the 
best fit in each sub-b ox. We s t op th e chains using the con- 
vergence criterion of iGewekd (| 19921 ) once they are past a 
minimum number of iterations. 

Our method thus divides the binary-lens parameter 
space not only into a (d, q) grid, but further into the required 
sub-boxes for each caustic for each (d, q) pair. In each sub- 
box we start the MCMC run at the maximum of -n (s- m , s ou t). 
Instead of using x 2 as the sole criterion for acceptance or re- 
jection of proposed MCMC steps, the ratio of the priors is 
also taken into account. 

To incorporate non-uniform priors n (9) in the MCMC 
algorithm, we simply modify the criterion for accepting a 
proposed step. Rather than only the likelihood P(D\8), we 
consider the full posterior P(6>|L>) oc P(D\6) ty (9). A pro- 
posal to take a random step from 9 to 9' is always accepted 
if 9' increases the posterior, and the acceptance probabil- 
ity when 8' diminishes the posterior is the ratio of posterior 
probabilities 



P(0'\D) 



= exp 



(11) 



P(9\D) 

where Ax 2 = X 2 {@') ~ X 2 (@) i s t ne difference in x 2 across the 
proposed step. For an MCMC run over the full parameter 
set 9, the relevant prior is 



7r(0) = 7r(d,g) ir(J3\d,q) 



(12) 



For MCMC runs over the nuisance parameters /3, with fixed 
d, q, the relevant prior is 



7r {f)\d, q) oc 7r (s in , s out ) 7T H (d, q, a) 



(13) 



4.3 Implementation 



We implemented the algorithm by using a cluster of desktop 
computers, each one running one of the MCMC chains to 
map out the posterior P(/3\D, d, q, box) for a grid of (d,q) 
values and for all the corresponding sub-boxes. The results 
are then collected to construct the posterior probability map 
P(d,q\D), integrating over the nuisance parameters /3, and 
summing over the sub-boxes. 

P(d, q\D, box) is evaluated from each MCMC chain us- 
ing the best-fit parameters /3: 



box 



P(d, q\D, box) tx P(D\d, q, box, (3) n (j3\d, q, 

(14) 

Because each sub-box has its own MCMC chain, we must 
weight the chain averages by the prior probability of each 
sub-box: 

7r(box|d, q) = JJ 71 ( Sin > s ° ut ) 7111 ds in ds out , (15) 

where the integration limits cover the sub-box. The weighted 
sum of chain averages then gives the posterior (d, q) map, 

P{d,q\D) = ^7r(box|d, 9 ) P(d,q|D,box) . (16) 



Normally one sub-box dominates the sum, but sometimes 
two or more can contribute. 



5 RESULTS 

5.1 Badness-of-Fit Criteria 

We consider and compare results for four alternative 
"Badness-of-Fit" criteria, corresponding to maximum likeli- 
hood (ML), maximum a-posteriori (MAP), and the Bayesian 
Information Criterion (BIC), as well as a Bayes statistic that 
integrates the posterior probability over the nuisance param- 
eters. In each case the best-fit parameters (d, q) minimise 
a "Badness-of-Fit" statistic, BoF(d, q), and the correspond- 
ing posterior probability is P(d, q\D) oc exp { — ^ BoF(d, q)}. 
Figs |6]|8] display the BoF(d, q) maps obtained for the four 
cases: 



ML : 


BoF 


= X 2 (/?) 


MAP : 


BoF 


= X 2 (/3) 


BIC : 


BoF 


= X 2 (/3) 


Bayes : 


BoF 


= x 2 -: 



Here the prior n is from Eqn. H2l or Eqn. 1131 depending 
on the context. We briefly elaborate on the three options 
below before discussing the results. 

• ML: The maximum likelihood (ML) parameters max- 
imise the likelihood L(d,q) = P(D\d, q), equivalent to min- 
imising BoF= — 21n(L) = x' '■ Thus we determine the best- 
fit value of x 2 f° r each (d,q) pair, and let P(d, q\D) oc 
exp (— x 2 /2). This approach emphasises the fit to the data 
while disregarding priors on the parameters. 

• MAP: The maximum a-posteriori (MAP) parameters 
maximise the posterior probability density P(d, q\D), equiv- 
alent to minimising BoF(d, q) = — 2 In P(d, q\D). 

• BIC: The Bayesian Information Criterion (BIC), ap- 
plies an "Occam" penalty that gives priority to "simpler" 
models that employ fewer parameters to achieve their fit. 
Each (d, q) grid point is regarded as a competing model 
with equal prior probability and N e g effective parameters 
that have been optimised. The Akaike I nformation C riterion 
(AIC) uses an Occam penalty 2 N c a (| Akaikel [l974l 'l . while 
the Bayesian Information Criterion (BIC) uses a stronger 
penalty \n(Np ) N e ff, with No the number of data points 
(|Schwardll978h . Our tests with fitting of polynomial models 
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suggest that the BIC may be more reliable than the AIC for 
model selection. 

We use the MCMC samples to estimate the "effective 
number" of nuisance parameters, 

N cH * (D(d)) - D((9}) , (17) 

where (x) denotes the expectation value of x under the pos- 
terior, simply calculated by taking an unweighted average 
over the MCMC samples, and the 'deviance' is 

D(0)s X 2 -2 1n 7r , (18) 

as used to compute the acceptance probability of each step 
in the MCMC algorithm, as per Eq. (fTT]). Here D({0)) es- 
timates the deviance at the minimum, while (D(6)} mea- 
sures the typical value, which should rise by 1 for each di- 
mension of the parameter space explored by the MCMC 
samples. This definition of AT e g is designed to avoid double- 
counting when two parameters are highly correlated, and 
is found in the Devia nce Info rmation C riterion (DIC, 
ISpiegelhalter et alJliool . see also lAnddl2007f ). 

• Bayes: A fully Bayesian approach integrates the pos- 
terior probability density over the m nuisance parameters 
/?, rather than just finding the maximum likelihood (ML) or 
maximum posterior probability density (MAP). Thus if two 
models have the same MAP statistic, the one that achieves 
that good fit over a wider range of parameters has a corre- 
spondingly higher probability. 

We can also write out the Bayes statistics as 



BoF = X 2 (/3) - 2 In n0) - ]T In (2 n , (19) 

i=i 

where the 2 n factor here refers to the constant n = 
3.141592... rather than the prior 7r(/3), and Ai(/3) are the 
m eigenvalues of the parameter-parameter covariance ma- 
trix evaluated at /3, their product being the m-dimensional 
parameter volume admitted by the data around the best-fit 
value p. 

We approximate the integral over the m nuisance param- 
eters by the method of steepest descents, 

J e-* 2 ^)/ 2 vr (/?) d m p « e -* 2 («/ 2 tt0) d m p (20) 

m , ,„ 

^e^ 2 ^,0)]j(2nX l 0)) , (21) 

i=i 

where the 2 7r factor here again refers to the constant rather 
than the prior. This is just the MAP statistic multiplied 
by a parameter space volume. We evaluate parameter space 
volume d m /3 as the square root of the determinant of the 
parameter-parameter covariance matrix derived from the 
MCMC chain. 

5.2 Fits to synthetic data 

For the synthetic event there is a single narrow, well-defined 
minimum. Table [2] summarises the parameters of the best- 
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Parameter 




Units 


d (grid) 


1.237 




q (grid) 


0.059 




9 = F B /F S 


5.81 ± 0.09 




x 2 


202.9 




"Standard" 


to 


5503.62 ±0.014 


MHJD 


*E 


37.52 ± 0.04 


days 


a 


1.692 ± 0.005 


rad 


"0 


0.056 ± 0.003 




P* 


(2.44 ±0.26) x 10~ 3 




"Caustic" 


tin 


5500.394 ± 0.009 


MHJD 


^out 


5507.342 ± 0.004 


MHJD 


Sin 


1.273 ± 0.002 




^out 


0.706 ± 0.002 




At cc 


0.092 ± 0.010 


days 



Table 2. Best- fit parameters (from d,q grid exploration) for the 
synthetic event. 

fit model found with a 14 x 14 (d, q) grid, evenly spaced in 
logd and log (jr. The posterior distribution P(d,q\D) found 
using the MAP option is plotted in Fig. [S] the ML and 
BIC posterior maps are almost undistinguishable. The BoF 
minimum is so tightly defined that the choice of BoF statistic 
has little effect on the best-fit parameters or the shape of the 
posterior map. 

The fit that is recovered it located at the grid point 
closest to the true model, as can be seen by comparing Fig. [5] 
to Fig. [2] and the best-fit parameters given in Table [2] to 
Table [T] The true parameters are not exactly recovered, as 
they do not match our grid points, but another modelling 
could be conducted without keeping d and q fixed, using the 
best models for each grid point as a starting point for new 
MCMC runs. 

5.3 Fits to OGLE-2007-BLG-472 data 

Our fits to the OGLE-2007-BLG-472 data are presented in 
FigsEE The contour levels are set at ABoF=2.3, 6.17, 11.8, 
20, 50, 100, 250 and 500, relative to the global minimum, 
the first 3 thus corresponding to 1, 2, and 3-<r confidence re- 
gions if the posterior is well approximated by a 2-parameter 
Gaussian. The best-fit values and uncertainties of additional 
parameters are summarised in Table [3] 

Fig. [6] exhibits an extended region of low x 2 around the 
minimum at d — 0.51 and q — 2 x 10~ 4 . The width in d is 
unresolved by the rather coarse (d, g) grid, and the extension 
in logq is around 1 dex. The best-fit model has the source 
crossing a very small planetary caustic, requiring a very long 
event timescale, £e ~ 2000 d, to match the observed cross- 
in gs at ti n and t out se parated by 3 d. Thus, as was also found 
m iKains et af] (|2009f ). the lowest-x 2 model for this event is 
not very well constrained, and has an implausibly long tE- 
There are no significantly different competing local minima 
with A\ 2 < 20; the first competitive model for which the 
configuration (source trajectory and location of the caustic 
crossings) is significantly different has A\ 2 ~ 22. 



Changing the BoF statistic has a significant effect on the 
posterior map: the penalties introduced by the prior move 
the best-fit model "up" in q, towards models with smaller 
tE and configurations where the source crosses a central, 
rather than planetary, caustic. The MAP, BIC and Bayes 
fits (Fig. 0-1} all favour a model with q ~ 0.12, d ~ 0.61. 

The x 2 increases by Ax 2 = 43 relative to the global x 2 
minimum, but the priors compensate since tE and p* both 
move toward more plausible values, and the larger caustic 
is easier to hit. Fig. [3] shows the location of the lowest-x 2 
and best Bayesian models with respect to the 7r(p*,tE) con- 
tour. The x 2 model is in the wings of the prior distribution, 
whereas the best-BIC model is near its peak, meaning that 
the x 2 model is more strongly penalised by the prior. 

We used the best-fit Bayesia n parameters (Table [3} and 
the algorithm of iDominikl (|2006f ) to derive probability dis- 
tributions for the lens mass and distance as shown in Fig. [9] 
With no parallax signal detected in this event, we used only 
the constraint from tE and p*. We find lens component 
masses of 0.78±q -f 7 M© and O.lltoog M s at a distance of 
5.88j££ kpc. 

The best Bayesian model has a large blend/source flux 
ratio Fb/Fs ~ 200. There is no obvious star blended with 
the lens on the images, and the blending could plausibly 
come from the binary-star lens system, or from a third body. 



6 DISCUSSION AND CONCLUSIONS 

The modelling results for the two datasets presented here 
indicate that our algorithm is successful in locating min- 
ima throughout the parameter space, and the subdivision of 
the prior maps ensures that all possible source trajectories 
through the caustics are explored. Furthermore, the use of 
Bayesian priors allows us to incorporate information on the 
event timescale distribution, as well geometrical information 
on the concavity of caustics. 

Although the sampling rate for our synthetic lightcurve 
data is not particularly high compared to what can now be 
achieved by survey and follow-up teams, our algorithm lo- 
cated a well-defined minimum near the true minimum, with 
a grid search of the (d, q) parameter space and MCMC runs 
to sample the posterior probability in the region around each 
local minimum. 

In our re-analysis of the OGLE-2007-BLG- 472 data, we 
impro ve upon the posterior map calculated in IKains et al.l 
(2009) for OGLE-2007-BLG-472 because we now use an 
MCMC run for each prior sub-box separately rather than 
just a single one per (d, q) grid point. We find that changing 
the badness-of-fit statistic leads to important changes in the 
posterior P(d, q\D) maps. In particular, the model with low- 
est x 2 has a planetary mass ratio and an implausibly long 
tE ~ 2000 d. Adding priors dramatically shifts the location 
of the best-fit model, lowering the timescale to ts ~ 70 d. 
Using a Bayesian approach to penalise models with improb- 
able parameters leads to best-fit parameters corresponding 
a binary star lens with 0.78 and 0.12 Mq components at a 
distance of ~ 5.9 kpc, and a more typical event timescale 
tE ~ 70 d. The only remarkable parameter is a rather high 
blending fraction, which could arise from either the lens it- 
self or a closely blended thi rd star. The new m odel is very 
different from that found bv lKains et~ai] (|2009l ). which char- 
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Figure 5. MAP fit to the synthetic lightcurve data, using BoF= x 2 0)-2 law0). Top: Posterior maps P(d, q\D) for the source crossing 
a central (left) and secondary (right) caustic. Contour levels are at ABoF = 2.3,6.17,11.8,20,50,100,250. The model with the lowest 
BoF is marked with a yellow filled circle and that of the true model with a filled blue triangle. Bottom: The data and best-fit model 
lightcurve, with an inset showing the source trajectory crossing the caustic (left), and the location (yellow filled circle) of the best-fit 
model on the prior map P(s ln , s ou t), along with MCMC samples (red circles) and local best-fit minima in each sub-box, indicated by 
yellow circles (right). 



acterised the lens as a binary star with components masses 
of 1.50 and 0.12M Q at a distance of 1 kpc. 

The development of automated algorithms for real-time 
modelling such as that presented here allows observers to 
receive feedback on ongoing anomalous microlensing events, 
and ensure that important features predicted by real-time 
modelling are not missed. This makes it much easier to assess 
the nature of the lensing system more rapidly and allocate 
observing time to targets more effectively. When observa- 
tional coverage is not complete, or when the \ 2 alone is not 
sufficient as a criterion for badness-of-fit, statistics like the 
those we use in this paper could help to assess reliably alter- 
native models. Furthermore, provided that the chosen priors 
are appropriate, comparing the resulting posterior maps of 
using different statistics allows for a useful test of a given 
model's robustness. 
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Figure 7. Same as Fig. [6] but for a fully Bayesian fit using BoF= x 1 —1 ln(7r (/3) d m f}), thus augmenting the likelihood with suitable 
Bayesian priors on the parameters and taking into account the effective number of parameters, as detailed in the text. Corresponding 
figures for a MAP and BIC fit are shown on Fig. [8] for comparison. 




Figure 9. Probability density distributions for the lens mass (left) and distance (right) using the best-fit parameters found by minimising 
the BIC, computed using the algorithm of Dominik (2006). The distance distribution assumes a source distance Dg = 8.5kpc. Dashed 
lines and dotted lines respectively denote the median value and the limits of the 68.3% confidence interval. 
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Parameter 


ML (Fig. [6) 


MAP /BIC /Bayes (Fig.0 


Units 


ML (x 2 ) 


915 


958 




MAP 


1017 


968 




BIC 


1046 


986 




Bayes 


1061 


1003 




d (grid) 


0.51 


0.61 




q (grid) 


2.03 x 10~ 4 


0.119 




9 = F B /F S 


7.59 ± 0.08 


114.59 ±9.62 




(OGLE) 








AW 


4.20 


4.97 




Standard 


to 


7121.28 ± 113.61 


4332.41 ±0.25 


MHJD 


'e 


1939.35 ± 80.92 


73.37 ±5.45 


day 


Q 


3.134 ±0.044 


3.050 ± 0.020 


rad 


itrj 


-0.181 ±0.029 


-0.052 ± 0.003 




P* 


(3.09 ±0.37) x 10~ 5 


(5.66 ±0.51) x 10~ 4 




"Caustic" 


tin 


31.379 ± 0.012 


31.325 ± 0.016 


MHJD-4300 


tout 


34.078 ± 0.002 


34.077 ± 0.002 


MHJD-4300 


S i n 


1.785 ± 0.012 


0.807 ± 0.005 




^out 


1.011 ± 0.013 


0.423 ± 0.011 




Ai cc 


0.073 ± 0.003 


0.072 ± 0.004 


day 


I, 


17.95 


20.77 


mag 


h 


15.74 


15.62 


mag 


9, 


1.15 


0.53 


/^as 



Table 3. Best-fit standard (top) and "caustic" (middle) parameters for OGLE-2007-BLG-472, as well as source properties and blend 
magnitude (bottom), for the ML and BIC statistics. The values of all 4 BoF statistics are given for both models for informative purposes. 
d and q are fixed since these are grid models. Source angular radii are computed using the same colour as in the previous paper on this 
event (Kains et al. 2009). 



